Self-indexed Text Compression Using Straight-Line Programs
Authors
Abstract
Straight-line programs (SLPs) offer powerful text compression by representing a text T[1, u] as a restricted context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating on the grammar in compressed form has received little attention. We present a grammar representation whose size is of the same order as that of a plain SLP representation and that can answer queries other than expanding nonterminals, which may be of independent interest. We then extend it to obtain the first grammar representation capable of extracting text substrings, and of searching the text for patterns, in time o(n). We also obtain byproducts for representing binary relations.
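As a point of reference, here is a minimal Python sketch of a plain SLP. It assumes the common normal form in which every rule is either X_i → a for a terminal a, or X_i → X_j X_k with j, k < i, and the last rule derives T. The class `SLP` and its methods `expand` and `extract` are hypothetical names chosen for illustration. Storing the expansion length of each rule lets `extract` return T[l, r] without expanding the whole text. This is the straightforward baseline, not the compressed representation or the o(n) search described above.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a rule is a terminal character, or a pair (j, k)
# meaning X_i -> X_j X_k with j, k < i.
Rule = Union[str, Tuple[int, int]]


class SLP:
    """Plain straight-line program; the last rule generates the text T."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        # lengths[i] = length of the expansion of X_i, computed bottom-up in O(n).
        self.lengths: List[int] = []
        for i, rule in enumerate(rules):
            if isinstance(rule, str):
                self.lengths.append(1)
            else:
                j, k = rule
                if not (0 <= j < i and 0 <= k < i):
                    raise ValueError(f"rule {i} must refer to earlier rules")
                self.lengths.append(self.lengths[j] + self.lengths[k])

    def expand(self, i: int) -> str:
        """Expand X_i; each node of its derivation tree is visited once."""
        out: List[str] = []
        stack = [i]
        while stack:
            rule = self.rules[stack.pop()]
            if isinstance(rule, str):
                out.append(rule)
            else:
                j, k = rule
                stack.append(k)  # push the right child first
                stack.append(j)  # so the left child is expanded first
        return "".join(out)

    def extract(self, l: int, r: int) -> str:
        """Return T[l..r] (1-based, inclusive) by skipping subtrees that lie
        outside the range, using the stored lengths."""
        lo, hi = l - 1, r - 1  # 0-based inclusive bounds
        out: List[str] = []
        # Stack entries: (rule index, 0-based position in T of its first symbol).
        stack = [(len(self.rules) - 1, 0)]
        while stack:
            i, start = stack.pop()
            end = start + self.lengths[i] - 1
            if end < lo or start > hi:
                continue  # expansion of X_i is disjoint from [lo, hi]
            rule = self.rules[i]
            if isinstance(rule, str):
                out.append(rule)
            else:
                j, k = rule
                stack.append((k, start + self.lengths[j]))
                stack.append((j, start))
        return "".join(out)


# Fibonacci-word grammar: X5 expands to "abaababa" using only 6 rules.
slp = SLP(["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)])
print(slp.expand(5))      # abaababa
print(slp.extract(3, 6))  # aaba
```

Expanding the start symbol visits the u leaves and u − 1 internal nodes of the derivation tree once each, which matches the O(u) recovery time mentioned above. In this sketch, `extract` visits only nodes that overlap the range plus their immediate children, so its cost is proportional to r − l + 1 plus the grammar height.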
Similar resources
Self-Indexed Grammar-Based Compression
Self-indexes aim to represent text collections in a compressed format that allows extracting arbitrary portions and also offers indexed searching on the collection. Current self-indexes are unable to fully exploit the redundancy of the highly repetitive text collections that arise in several applications. Grammar-based compression is well suited to exploit such repetitiveness. We introduce th...
Indexing Straight-Line Programs
Straight-line programs offer powerful text compression by representing a text T[1, u] as a context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of operating on the grammar in compressed form has received little attention. We present the first grammar representation capable of extracting text substrings, and of searching the text for patterns, in time o(...
Fully Compressed Pattern Matching Algorithm for Balanced Straight-Line Programs
We consider a fully compressed pattern matching problem, where both the text T and the pattern P are given by their succinct representations, in terms of straight-line programs and a variant of them. The lengths of T and P may grow exponentially with respect to their description sizes n and m, respectively. The best known algorithm for the problem runs in O(nm) time using O(nm) space. In this paper...
Faster fully compressed pattern matching algorithm for a subclass of straight-line programs
We show an efficient pattern-matching algorithm for strings that are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is concatenation. In this paper, both the text T and the pattern P are given by straight-line programs T and P. The length of the text T (pattern P, resp.) may grow exponentially with respect to its description size ...
Querying and Embedding Compressed Texts
The computational complexity of two simple string problems on compressed input strings is considered: the querying problem (What is the symbol at a given position in a given input string?) and the embedding problem (Can the first input string be embedded into the second input string?). Straight-line programs are used for text compression. It is shown that the querying problem becomes P-complete...